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(54) Memory allocation in a multithreaded environment 



(57) A method of allocating memory ( 1 6) in a multi- 
threaded (parallel) computing environment in which 
threads (30-33) running in parallel within a process are 
associated with one of a number of memory pools (24, 
38-40) of a system memory. The method includes the 
steps of establishing memory pools in the system mem- 



ory, mapping each thread to one of the memory pools: 
and. for each thread, dynamically allocating user mem- 
ory blocks from the associated memory pool. The meth- 
od allows any existing memory management ma Hoc 
(memory allocation) package to be converted to a mul- 
tithreaded version so that multithreaded processes are 
run with greater efficiency. 
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Description 

Background of the invention 

s The invention relates to memory allocation and more particularly to memory allocation in a multithreaded (parallel) 

environment. 

In allocating memory for a computer program, most older languages (e.g., FORTRAN, COBOL) require that the 
size of an array or data item be declared before the program is compiled. Moreover, the size of the array or data item 
could not be exceeded unless the program was changed and re-compiled. Today, however, most modern programming 

10 languages, including C and O*. allow the user to request memory blocks from the system memory at run-time and 
release the blocks back to the system memory when the program no longer needs the blocks. For example, in these 
modern languages, data elements often have a data structure with a field containing a pointer to a next data element. 
A number of data elements may be allocated, at run-time, in a linked list or an array structure. 

The C programming language provides memory management capability with a set of library functions known as 

is "memory allocation" routines. The most basic memory allocation function is called malioc which allocates a requested 
number of bytes and returns a pointer that is the starting address of the memory allocated. Another function known as 
free returns the memory previously allocated by malioc so that it can be allocated again for use by other routines. 

In applications in which memory allocation occurs in parallel, for example, in a multithreaded process, the malioc 
and free functions must be "code-locked'. Code-locking means that the library code of the process containing the 

20 thread is protected with a global lock. This prevents data corruption in the event that one thread is modifying a global 
structure when another thread is trying to read it. Code-locking allows only one thread to call any of the malioc functions 
(e.g., malice, free, reailoc) at any given time with other threads waiting until the thread is finished with its memory 
allocation. Thus, in a multithreaded process in which memory allocation functions are used extensively, the speed of 
the system is seriously compromised. 

2S 

Summary of the Invention 

I n general, in one aspect, the invention is a method of allocating memory in a multithreaded computing environment 
in which threads running in parallel within a process each have an associated memory pool in a system memory. Th 

30 method includes the steps of establishing memory pools in the system memory, mapping each thread to one of th 
memory pools; and, for each thread, dynamically allocating user memory blocks from the associated memory pool. 
Each thread uses memory allocation routines (e.g., malioc) to manipulate its own memory pool, thereby providing 
greater efficiency of memory management. 

The invention converts an existing memory management malioc package to a multithreaded version so that mul- 

3S tithreaded processes are run with greater efficiency. Moreover, the invention is applicable to any application requiring 
memory management in parallel; in particular, those applications requiring significant parallel memory management. 
Furthermore, use of the invention is transparent from the application programmer's standpoint, since the user interface 
is the same as that of the standard C library memory management functions (i.e., malioc, free, reailoc). 

In a preferred embodiment, the method may further include the step of preventing simultaneous access to a memory 

40 pool by different threads. Having separate memory pools allows separate code-locking (e.g., mutex locking) to prevent 
simultaneous access to the memory pools by the different threads, thereby eliminating the possibility of data corruption. 
In existing standard memory allocation routines suitable for parallel execution, there is only a single code lock. Thus, 
only one thread can make a memory allocation routine call at any given time. All other threads running in the process 
must wait until the thread finishes with its memory allocation operation. In the invention, on the other hand, so long as 

45 each thread is manipulating its own memory, memory allocation operations can be performed in parallel without any 
delay. The separate code-locking feature only becomes important when a thread anempts to access the memory pool 
of another thread. Such memory allocations of a memory pool not associated with that thread are fairly uncommon. 
Thus, the invention provides an improvement in the performance of the multithreaded process by significantly reducing 
time delays associated with memory allocation routine calls. 

so Preferred embodiments may include one or more of the following features. The step of dynamically allocating 

memory blocks includes designating the number of bytes in the block desired to be allocated. For example, calling the 
malioc function will allocate any number of required bytes up to a maximum size of the memory pool. The step of 
establishing a memory pool for each thread may further include allocating a memory buffer of a preselected size (e. 
g., 64 Kbytes). In the ev nt that the size of the memory pool has been exhausted, the size of th m mory pool may 

55 b dynamically increased by allocating additional m mory from the system memory in increm nts equal to the prese- 
lected size of the buffer memory. Moreover, the method may furth r include allowing one f the threads to transfer 
memory from the mem ry pool of anoth r of the threads to its memory pool. 

Each memory pool may be maintained as a data structure of memory blocks, for example, an array of static var- 



2 



(19) 



J 



Eur p*isches Patentamt 
European Patent Offlc 
Office uxopden d s brav ts 



HIIDIIIIIIIIIIIlllll 

EP 0 817 044 A3 



(12) 



EUROPEAN PATENT APPLICATION 



(88) 


Date of publication A3: 
15.04.1908 Bulletin 1998/16 


(51) intciA G06F 9/46 




Oat a of r»i iKh/*ati/^n AO* 

Uaie oi puoncaiion m^. 
07.01.1908 Bulletin 1998/02 






(21) 


Application number: 97304653.5 






(22) 


Date of filing: 27.06.1997 






(84) 


Designated Contracting States: 

AT BE CH 0E DK ES Fl FR QB OR IE IT LI LU MC 


(72) 


Inventor: Nakhlmovsky, Gregory 
Wakefield, Massachusetts 01880 (US) 




NLPTSE 

Designated Extension States: 
AL LT LV SI 


(74) 


Representative: Abnett, Richard Charles 
REDCME & GROSE 
16 Theobalds Road 


(30) 


Priority: 28.06.1996 US 673382 




London WC1X8PL (GB) 


(71) 


Applicant: SUN MICROSYSTEMS, INC. 
Mountain View, CA 94043 (US) 







(54) Memory allocation in a multithreaded environment 



(57) A method of allocating memory ( 1 6) in a multi- 
threaded (parallel) computing environment in which 
threads (30-33) running in parallel within a process are 
associated with one of a number of memory pools (24, 
38-40) of a system memory. The method includes the 
steps of establishing memory pools in the system mem- 



ory, mapping each thread to one of the memory pools: 
and, for each thread, dynamically allocating user mem- 
ory blocks from the associated memory pool. The meth- 
od allows any existing memory management malloc 
(memory allocation) package to be converted to a mul- 
tithreaded version so that multithreaded processes are 
run with greater efficiency. 
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iabl s Identified by a thread index associated with on of th memory pools. Th data structure includes a header 
which includes the size of the memory block and th memory pool index to which it is associated. Th size of th block 
and the memory pool index may both be. for example, four bytes. 

The method may further includ the step of allowing each thread to deallocate or fre a memory block to the 
5 memory pool. The application may require that the memory block be freed from the thread which originally allocated 
the memory block. Other applications may allow the memory block to be freed from a thread which did not originally 
allocate the block. 

Coalescing or merging deallocated (or freed) memory blocks may be performed to unite smaller fragmented blocks. 
However, the method prevents coalescing of memory blocks from different pools. 

10 in the event that the size of a memory block needs to be enlarged in order to store more data elements, the size 

of an allocated block of memory allocated by a memory pool may be changed using a realloc routine. The method 
requires that realloc preserves the original memory pool. 

In general, in another aspect, the invention is a computer-readable medium storing a computer program for allo- 
cating memory in a multithreaded computing environment in which threads run in parallel within a process, each thread 

1$ having access to a system memory. The stored program includes computer-readable instructions: (1 ) which establish 
a plurality of memory pools in the system memory; (2) which map each thread to one of said plurality of memory pools; 
and (3) which, for each thread, dynamically allocate user memory blocks from the associated memory pool. A computer- 
readable medium includes any of a wide variety of memory media such as RAM or ROM memory, as well as, external 
computer-readable media, for example, a computer disk or CD ROM. A computer program may also be downloaded 

20 into a computer's temporary active storage (e.g., RAM, output buffers) over a network. For example, the above-de- 
scribed computer program may be downloaded from a Web site over the Internet into a computer's memory. Thus, th 
computer-readable medium of the invention is intended to include the computer's memory which stores the above- 
described computer program that is downloaded from a network. 

In another aspect of the invention, a system includes memory, a portion of which stores the computer program 

2S described above, a processor for executing the computer-readable instructions of the stored computer program and 
a bus connecting the memory and processor. 

Other advantages and features will become apparent from the following description of the preferred embodiment 
and from the claim. 

30 Brief Description of the Drawings 

Fig. 1 is a block diagram of a multi-processing computer system which is suitable for use with the invention. 
Fig. 2 illustrates the relationship between a multithreaded application and a shared memory. 
Fig. 3 diagrammatically illustrates a data object in memory. 
35 Fig. 4 illustrates the relationship between a multithreaded application and a shared memory in which more threads 

than memory pools exist. 

Fig. 5 is an example of an application which calls memory management functions from threads running within a 
process 

40 Description of the Preferred Embodiments 

Referring to Fig. 1 , a simplistic representation of a multi-processing network 10 includes individual processors 1 2a- 
1 2n of comparable capabilities interconnected to a system memory 1 4 through a system bus 1 6. All of the processors 
share access to the system memory as well as other I/O channels and peripheral devices (not shown). Each processor 

45 is used to execute one or more processes, for example, an application. 

Referring to Fig. 2, an application 20 which may be running on one or more of the processors I2a-i2n (Fig. 1 ) is 
shown. Application 20 includes, here, a single thread 22 which has access to a section 24 of allocated memory within 
the system memory 14. This memory section is referred to as a memory pool. The application also includes a multi- 
threaded portion shown here having four threads 30-33. Although four threads are shown, the number of threads 

so running at any given time can change since new threads may be repeatedly created and old threads destroyed during 
the execution of the application. Each of threads 30-33, for example, may run on a corresponding one of processors 
1 2a-l 2n. In other applications, all or multiple threads can run on a single processor. Thread 30 is considered to be the 
main thread which continues to use the memory section 24 allocated by the application as a single thread. However, 
additional threads 31-33 allocat th ir own m mory pools 38-40 from the syst m memory 14. Thus, each thread is 

ss associated with a memory pool for us in executing its operations. During the execution of th application running on 
th threads, each thread may be repeatedly allocating, fr eing and r allocating memory blocks from its associated 
memory pool using m mory allocation functions (i.e. . mailoc, free, realloc) which are described in greater detail below. 
Moreover, while on thread is generally designat d as the main thread, some of the remaining threads may be desig- 
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nated for particular purposes. 
Establishing Memory Pools 

5 The number of memory pools (NUM_POOLS) is fixed. Although the malloc package programmer can change the 

number of pools, the package must be rebuilt after doing so. 

Establishing a memory pool for each thread includes allocating a memory buffer of a preselected size (e.g., 64 
Kbytes). In the event that the size of the memory pool has been exhausted, the size of the memory pool may be 
dynamically increased by allocating additional memory from the system memory. The additional memory may be allo- 
10 cated, for example, using a Unix system routine called sbrk( ) which, in this implementation, is called internally from 
within malloc and allocates the additional memory in increments equal to the preselected size of the buffer memory. 
Allocating additional memory requires the pool to be locked which prevents other memory functions to be performed 
at the same time. Thus, the size of the memory buffer is selected to be large relative to the average amount of memory 
requested by mallocf ) so that calls for increasing the size of the pool are infrequent. 

Each memory pool may be set up as a binary tree data structure with individual blocks of memory comprising the 
pool. The binary tree is ordered by size, although it may be ordered by address. Other data structures (e.g., linked 
lists) may alternatively be used; however, a binary tree structure may be preferred because of the increased speed it 
offers in searching. Moreover, a balancing or self-adjusting algorithm may be used to further improve the efficiency of 
the search. 

20 Referring to Fig. 3, each block of memory 40 is identified by a data object 40 having a header 42 with a length 

consistent with the alignment requirements of the particular hardware architecture being used. For example, certain 
hardware configurations used by Sun Microsystems lnc. ; Mountain View, CA require the header to be eight bytes in 
length to provide an alignment boundary consistent with a SPARC architecture. The first four bytes of the header 
indicate the size of the block, with the remaining four bytes indicating a pool number. 

2S 

Memory Management Functions 

Each thread 20-23 allocates memory for its memory pool using a set of memory allocation routines similar to thos 
from a standard C library. The basic function for allocating memory is called malloc and has the following syntax: 

30 

void * malloc (size) 

where size indicates the number of bytes requested. 
35 Another memory allocation routine is free which releases an allocated storage block to the pool of free memory 

and has the following syntax: 

void • free (old) 

40 

where old is the pointer to the block of memory being released. 
Still another memory allocation routine is realloc which adjusts the size of the block of memory allocated by malloc. 
Reallochas the following syntax: 

45 

void * realloc (old. size) 

where: 

so old is the pointer to the block of memory whose size is being altered; and 

size is the new size of the block. 

Converting an Existing Malloc Packaae to a Multithreaded Malloc Package 

ss in order t convert an existing m mory management package which uses a single lock to a parallel memory man- 

ag ment package, all static variables used in the above d scribed memory managem nt functions are conv rted into 
static arrays. For example, the binary tr structures associated with the m mory pools are stored as a static array 
Each element of th static array is identified by its thread index and is associated with a giv n memory pool. There is 
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a separate static array element within each array for each pool. Thus, searching through the particular data structure 
(e.g., binary tre ) for each thread can b performed in parallel. 

Each thread, therefore, can repeatedly execute any of the above routines to manage memory allocation of their 
associated memory pools. For example, referring again to Fig. 2, main thread 30 may execute a procedure in which 
s memory blocks within memory pool 24 may be allocated, freed, and allocated again numerous times. Simultaneously, 
threads 31 -33 may be executing procedures in which memory is being allocated and freed from and to their respective 
memory pools 38-40. 

Mapping Threads to Memory Pools 

10 

Whenever a memory allocation function is called, a thread-identifying routine within each one of these functions 
is used to identify the thread making the memory allocation request. The thread-identifying function returns the thread 
index of the thread making the request. For example, the Solaris Operating System (OS), a product of Sun Microsystems 
Inc., uses in one implementation a function called thyself ( ). 
is Another algorithm is then used to map each thread index to a memory pool number. For example, the described 

embodiment uses the following macro known as GET_THREADJNDEX which receives the thread index and returns 
an associated pool number: 

20 # define GETJTHREADJNDEX(seif) \ 

((self) = 1 ? 0 : 1 + ((setf)-4 % (NUM_POOLS-1) 

where: 

2S 

self is the thread index; and 

NUN_POOLS is the number of memory pools. 

As mentioned above, one thread is generally designated as the main thread with remaining threads designated for 

30 other purposes. For example, the SOLARIS OS uses a thread numbering system which reserves the first thread as a 
main thread, the second and third threads as system threads and subsequent threads as user threads. With the above 
macro, the memory pools are numbered 0 to NUM_POOLS-1. The first portion of the above macro (self == 1 ? 0) 
ensures that the main thread is always associated with the first pool number. Thus, if self is equal to 1 (i.e.. it is the 
main thread), then the pool number is 0. Otherwise, as shown in the remaining portion of the macro after the V, the 

3S remainder of the ratio of the thread index minus the constant four to the NUM_POOLS-1 is then added to the number 
1 to arrive at the pool number. For example, if there are only four memory pools (i.e., NUN_POOLS = 4) and the thread 
index is 4. the associated pool number returned by the macro is 1. Thread indices of 5 and 6 would have associated 
memory pools numbered 2 and 3, respectively. 

In applications in which the number of threads existing at any given time exceeds the number of established pools. 

40 the additional threads share memory pools with another thread associated with that pool. Referring to Fig. 4. for ex- 
ample, an application is shown in which a new fifth thread 34 has been created. Because only four memory pools were 
established, the above mentioned macro is used to map thread 34 to first memory pool 24 originally associated with 
only thread 30. In this situation, the mutex lock associated with memory pool 24 prevents access by either thread 24 
or 34. if the other is using the pool. In the example of the preceding paragraph, macro GET_THREADJNDEX would 

*s map threads having thread indices of 4 and 7 to memory pool #1 . 

Code-Locking Memory Pools 

Each memory pool 24 and 38-10 is protected by its own mutual exclusion (mutex) lock. Like the data structures 
so associated with each memory pool, mutex locks are stored in a static array. Each mutex lock causes no delay in a 
thread that is allocating, deallocating or reallocating one or more memory blocks from its own memory pool. However, 
when a thread not associated with a particular memory pool attempts to access a memory block already allocated by 
the thread associated with that pool, the mutex lock prevents the non -associated thread from deallocating or reallocating 
a memory block from that pool. Thus, th lock protects the m mory blocks from being updated or used by more than 
55 one thread at a time, th reby pr v nting th corruption of data in the mem ry block. Such attempts to allocate, deal- 
locate or reaiiocat memory blocks from a m mory pool not associated with a thread ar relatively infrequent. This 
feature provid s a substantial improvement in th speed performance of th system over conventional schemes in 
which a single mutex lock is used for all memory management routines. Using a single mutex lock can significantly 
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degrade the performanc of a multithreaded application. With this approach, one a thread makes a memory manage- 
ment function call (i. .. malkx, free, or realkx) all other threads must wait until the thr ad has finished performing its 
memory management function. By providing separate mutex locks for each memory pool, each thread can, in parallel, 
altocat and free its own memory within its own memory pool while preventing access from non-associated threads. 
5 As memory blocks are repeatedly allocated, freed and reallocated by a thread, the memory pool may become 

fragmented into smaller and smaller blocks. Coalescing or merging of freed memory blocks which are contiguous is 
periodically performed to form larger memory blocks which can be reallocated by the thread. However, before a memory 
block can be coalesced with an adjacent memory block, the described embodiment first determines whether the blocks 
are form the same pool. If not, the blocks are not coalesced, thus avoiding the possibility of data corruption. 

10 

Merge Malkx Pools 

The extent to which the individual threads use memory management may vary significantly For example, referring 
again to Fig. 2, threads 31 -33 may complete their tasks prior to the completion of the tasks performed by main thread 

is 30. In such situations, the main thread may call an optional interface function which transfers the memory allocated 
by threads 31 -33 to the main thread 30. In other words, the function may be called by the main thread at the end of 
the multithreaded portion to consolidate to the main thread the memory previously allocated by the other threads. Th 
routine used in this embodiment has the following prototype: 
void merge_ malloc_poofs (void); 

20 The use of this function may not be needed in applications in which the multiple threads perform significant memory 
management throughout the application. 

Referring to Fig. 5, a simplistic representation of an application is shown running within main thread 30 and user 
thread 31. It is assumed here that memory pools 24 and 38 (Fig. 2) which are associated with threads 30 and 31. 
respectively, have already been established With respect to main thread 30, a first ma//oc routine call 50 is mad 

25 requesting a block of memory having SI2E#1 bytes. Later in the application, a first free routine call 52 is made to return 
a block of memory identified by pointer OLD. At this time, coalescing is generally performed to combine the returned 
block of memory with an adjacent block, so long as they are both from the same memory pool. Still later in the thread, 
a second maitoc routine call 54 is made requesting a block of memory having SIZE #2 bytes. A realtoc call 56 requesting 
that a block of memory identified by pointer OLD be resized to SIZE#3 bytes follows. Thread 31 is shown executing 

30 procedures concurrently with thread 30. For example, a first malloc routine call 60 is made followed sometime later by 
a first free routine call 62. Finally, in this example, after completion of the multithreaded portion of the application, a 
merge_mailoc_pools routine 64 is called to consolidate memory blocks allocated by thread 31 to the main thread 30. 

Attached as an Appendix is source code software for one implementation of a method of converting an existing 
malloc package to a multithreaded version of a malkx package. The source code represents a version of the program 

35 based on the set of memory allocation routines described in The C programming language , 8.W. Kernighan and D.M. 
Richie, Prentice Hall (1988). 

Other embodiments are within the following claims. 
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^PPEMPIX 

/•A multithreaded (MT-hotl version of malloc and friends. 
Based oa the malloc package from 

the Kemighan 6 Ritchie ANSI C book (page IBS) , with modifications. 
By Greg Nekhimovsky 
Sun Microsystems. Inc. 
Market Development Engineering 
January 1994 

Changes from the original KSA version: 

• All malloc routines are made HT-saf«. 

• reallocO is added. 

• A separate free pool is created for each thread up to NUM_?0OLS. 
NUM POOLS is currently set to 4 but can be adjusted. 

• The~pool number is scored in the same header. 

• Increased header site to 15 bytes to make room for pool number* 

• each pool is protected by its own mutex lock. 

• freeO returns the freed block to the pool the block was malloc' ed from. 

• reallocO modifies the block in the original pool. 

• Coalescing (merging) is only done within the same pool. 

• A new routine is added for external interface: merge Mnalloc^pools () . 
If called by the application from the main thread after the threaded 
section is over* it transfers all memory bloeks from the additional pools 
back to the main pool. This call will eliminate a "memory leak' , in a 
sense that the main thread can reuse memory used by other threads. 

• To reduce additional fragmentation, the default block sixe for sbrkO 
is increased from 8R to *4K. 

When multiple treads allocate and deallocate their own memory, they don't 
wait on their own locks, when one thread tries to free or reallocate memory 
allocated by another thread, the lock protects the free list from being 
updated or used by more than one thread at a time. The lock is also used when 
there are more than NUM POOLS threads active at the same time. 



•include <stdio.h> 
•include <thread.h> 
•include « synch. h> 

/• number of pools for different threads •/ 
•define KUM.POOLS 4 

/♦ minimum number of 16-byte units to request from system • C4X in this case •/ 
•define KAXLOC 409* 

•define MAGIC 0x5555 /• to check integrity of pointers to free, realioc •/ 
•ifdef RJCZOTRAKT /• will not compile without ^REENTRANT defined •/ 
static mutex t ^pool locks (HUM_POOLS) ; 
•endif /♦ — REENTRANT V 

•define GETjniB£A&_XSCZ3C (self) ((self) — 1) ? 0 : 1 ♦ ( (Self ) -4 ) % <NDM_POOLS-l) 

•ifndef null 
•define NULL (0) 
•endif 



typedef double Align; 

/* increaded the header size to 16 bytes 
union header { 

struct { 

union header •ptr; /• 
unsigned size; /* 
unsigned pool; /• 
unsigned magic; /* 

) »; 

Align x[2] ; 



cat ♦/ 



next block if on the free list •/ 
size of this block •/ 
pool number •/ 

for checking - using 4 extra bytes 



/• need 1$ bytes because of the pool number */ 
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typedef union header Header; 

static Header baee fNUM^POOLS) ; /* empty lists to gat started •/ 
/• starts of free U»ts •/ 

static Header ♦freep(NUM_POOLSj - {NULL, NOLL, NULL, NULL}; 

static void *mall©c^loeked (unsigned nbytes, unsigned thread_indejc) ; 
static void free_loeked(void *ap, unsigned t hread_ index) ~" 
static Header ♦raorecore (unsigned nunits, unsigned thxead - index) ; 

void ♦malioc (unsigned nbytea) 

{ 

void *ret; 

unsigned self, thr eed_index ; 

'5 self - tfar aalSO; 

thread index - GET THREAD INDEX (self); 

~ \ 
rautexJLocx ( a jpool_locxa ( thread^index] ) ; 
ret -~malloc_locxed(nbytea # threed_index) ; 
mutex^unlocJcIa jool^loexs (tnread_iadaxj ) ; 
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return ret? 



static void ♦raelloc^locJced (unsigned nbytes, unsigned tnread_index) 

Header *p, *prevp; 
unsigned nunits ; 



nunits • (nbytee+eirecf (Header) -1) /aixeof (Header) ♦ 1; 
if ((prevp • f reap Cthread^indax] ) — HULL) { /♦ no free list yet */ 
baee (tnread_index) .s.ptr - freep [tfareed_index) - 

prevp • these (thread_index) ; 
base (thxead_indexl .a. size • 0; 
base { thread_index] .a. pool « thread_index; 
base (thread^index) . s . magic • MAGXC? 

for (p - prevpos.ptr; ; prevp - p, p • p->e.ptr) { 
J5 if (p->e.sire >• nunits) { /* big enough ♦/ 

if (p->s.sixe nunits) /♦ exactly •/ 

prevp->s.ptr • p->e.ptr; 

else { 

p~>e.size nunita; 
P ♦« p->a.si*e* 
p->e.sixe - nunits ; 
p->s.pool m thread index; 
p->s. magic • MAaicT 

freep (thready index) - prevp; 
Al . return (void**) <p*l) ; 

} 

if <p mm freep{thread_index)) /♦ wrapped around free list ♦/ 
if ( (p - raorecore (nunits , thread_index) ) NULL) 
return NULL; /* none left */ 



static Header •morecore (unsigned nu, unsigned thread_index) 

char *cp, •sbrk(int); 
Header *up; 

if (nu < NALLOC) 
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nu - NALLOC; 

cp - sbrkinu • sizeof (Header) ) ; /♦ abrkO assumed locked - GM •/ 
if (cp — (char •) -l) /* no space at all ♦/ 
return NULL; 
5 up ■ (Header •) cp; 

up* >s. size ■ nu; 

up- >a. pool - threed_index; 

up- >s. magic • MASIC; 

free_locked( (void •Kup+l), thread_index) ; 
recurn f reep Cthread_index] ; 

10 ) 

void free (void *ap) 

{ 

Header *bp; 

unsigned thxead_index; 

15 int i; 

if (ap HULL) 

return; N . 
/• free the block of the thread which allocated it •/ 
bp - (Header *)ap - 1; /* point to block header ♦/ 
20 if (bp- >a. magic !- MAGIC) { 

printf ("bogus pointer %x passed to free()\n", ap) ; 

abort*); 

thread_index • bp- >a. pool; 

25 mutex_lock ( fi_pool_loeka ( thr ead_index) ) ; 

free locked (ap. thread_index) ; 
^ tsutex^unlock ( ft jpool_locks [ threed Mn d ex ) ) ; 

•tatic void free locked (void *ap, unsigned thread index) 

30 { 

Header *bp, *p; 
int i; 

bp • (Header *)ap - 1; /• point to block header */ 

for (p m fre*p(thre*d_index) ; I (bp > p aft bp < p->e.ptr>; p - p->s.ptr 
if (p >• p->a.ptr 6ft (bp > p 1 1 bp < p->e.pcr)) 

break; /• freed block at start or end of arena •/ 

if (bp * bp- >s. size p-»e.ptr ftft 

bp->s.pool p->s.pool) { /• join to upper nbr ♦ / 
bp- >s. site *• p- >s. per- >s. size; 
bp->s.ptr ■ p->s.ptr->s.ptr; 

} else 

bp->s.?tr * p->s.pcr; 
if (p ♦ p->s.size bp ftft 

p->s.pool bp- >s. pool) { /♦ join to lower nbr •/ 
p->s.size bp- >s. size; 
p->s.ptr • bp->s.ptr; 

} else 

p->s.ptr - bp; 
£reep(thread_indexl • p; 
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vcid ♦realloc(void •old, unsigned nbytes) 

/• Added by Gtf, since realloco is not given in the KftR book ♦/ 

void »new; 
55 Heeder •bp; 

unsigned self, thread_index, ncopy; 



EP 0 817 044 A2 



10 



if (Old mm MULL) 

return malloc (nbytes) ; 
self - thr selff); 

thread_index - GETjrHR£AI>_ INDEX (self ) ; 

bp - (Header +)old -1; /• point to block header */ 
if {bp- >•. magic !- MAGIC) { 

printf ( "bogus pointer \x passed to realloc\n°, old); 
abort ( ) ; 

} 

if (bp- >s. pool !■ thraad_index) 
thread_index - bp->a7pool; 
mutex^ockT&^pool^ocks [thread_index) ) ; * 

is /• for simplicity, 

always allocate a new block , copy and free the old one •/ 
if ((new - malloc^locked (nbytes, thread_index) ) NULL) { 
trajtex_unlock (*jpool_locks Cthread_indax] ) - 
return NULL; ~ \ 

) 
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if (nbytes > 0) ( 

ncopy - sizeof (Header) * (bp->e.aize - l) j 
if (ncopy > nbytes) 

ncopy - nbytes; 
meacpy<new, old, ncopy); 

free_locked(old, chread_index) ; 
rmitex_unlock (a jool^locka (thread_index] ) ; 
return new; 



/* New externally called function merging all additional pools into 
the main thread's pool. Should only be called from the main thread. 
Assumes that only the main thread is active. 

•/ 

void merge malloc _pools(void) 

55 { 

int i; 

Header *p, *prevp; 
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/♦ skip the main thread's pool (0) •/ 
for (i«l; i<NUM_FOOLS; { 
prevp . - freep(ij; 
if (prevp tm NULL) { 

for (p - prevp->s.ptr; ; prevp . p, p . p->s.ptr> { 
if(p->s.si2e > 0) { 

p->s.pool - 0; 

/♦no need to lock - main thread only 
j free_locked( (void ♦) (p+i) , 0) ; 

if (p « freepCi]) /• end of list •/ 
break; 

} 

so freepCi) - NULL; 



55 



10 



EP 0 817 044 A2 



Claims 

1. A method of allocating memory in a multithreaded computing environment in which a plurality of threads run in 
parall I within a process, each thread having access to a system memory, the method comprising: 

5 

establishing a plurality of memory pools in the system memory; 
mapping each thread to one of said plurality of memory pools; and 

for each thread, dynamically allocating user memory blocks from the associated memory pool. 

10 2. The method of claim 1 wherein the step of dynamically allocating memory blocks includes designating the number 
of bytes in the block desired to be allocated. 

3. The method of claim 1 further comprising the step of preventing simultaneous access to a memory pool by different 
threads. 

15 

4. The method of claim 1 further comprising the step of establishing a memory pool for each thread comprises allo- 
cating a memory buffer of a preselected size. 

5. The method of claim 4 further comprising the step of dynamically increasing the size of the memory pool by allo- 
20 eating additional memory from the system memory in increments equal to the preselected size of the buffer memory. 

6. The method of claim 4 wherein the preselected size of the buffer is 64 Kbytes. 

7. The method of claim 1 further comprising the step of one of the threads transferring memory from the memory 
25 pool of another of the threads to its memory pool. 

8. The method of claim 1 wherein each memory pool is defined by an array of static variables identified by a thread 
index associated with a memory pool. 

30 9. The method of claim S wherein each memory pool is maintained as a data structure of memory blocks. 

10. The method of claim 9 wherein each memory block comprises a header including the size of the memory block 
and the memory pool index to which it is associated. s 

35 11. The method of claim 10 wherein the size of the block and the memory pool index are each four bytes. 

12. The method of claim 1 further comprising the step of each thread deallocating a memory btockto the memory pool. 

13. The method of claim 12 wherein the thread originally allocating the memory block deallocates it to its associated 
40 memory pool. 

14. The method of claim 12 further comprising the step of coalescing deallocated memory blocks and preventing 
coalescing of memory blocks from different pools. 

45 is. The method of claim 1 further comprising the step of changing the size of an allocated block of memory allocated 
by a memory pool. 

16. A computer-readable medium storing a computer program which is executable on a computer including a memory, 
the computer program for allocating memory in a multithreaded computing environment in which a plurality of 

so threads run in parallel within a process, each thread having access to a system memory, the stored program 

comprising: 

computer-readable instructions which establish a plurality f memory pools in th system memory; 
comput r-readable instructi ns which map each thread t one of said plurality of m rnory pools; and 
ss computer-readable instructions which, for each thr ad, dynamically allocate user m m ry blocks from the 

associated m mory pool. 

17. Th computer-readabl medium of claim 16 wherein the stored program further comprises computer instructions 
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which prev nt simultaneous acc ss to a memory pool by differ nt threads. 

18. Th computer-readabl medium of claim 16 wherein the stored program further comprises computer instructions 
which causes one of the threads to transfer memory from the memory pool of another of the threads to its memory 
pool. 

19. The computer-readable medium of claim 16 wherein each memory pool is defined by an array of static variables 
identified by a thread index associated with a memory pool. 

20. The computer-readable medium of claim 16 wherein the stored program further comprises computer instructions 
which coalesces deallocated memory blocks and prevents coalescing of memory blocks from different pools. 

21. A system comprising: 

memory, a portion of said memory storing a computer program for allocating memory in a multithreaded com- 
puting environment in which a plurality of threads run in parallel within a process, each thread having access 
to the memory, the stored program comprising: 

computer-readable instructions which establish a plurality of memory pools in the memory; 
computer-readable instructions which map each thread to one of said plurality of memory pools; and 
computer-readable instructions which, for each thread, dynamically allocate user memory blocks from th 
associated memory pool; 

a processor to execute said computer-readable instructions; and 
a bus connecting the memory to the processor. 
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